ABSTRACT


Adaptive/general learning algorithms using various neural network
models are considered for the intelligent control of robotic arm
plus dextrous hand/manipulator systems. Results are summarized and
discussed for the use of the Barto/Sutton/Anderson neuronlike,
unsupervised learning controller as applied to the stabilization of
an inverted pendulum on a cart system. Recommendations are made for
the application of the controller and a kinematic analysis for
trajectory planning to simple object retrieval (chase/approach and
capture/grasp) scenarios in two dimensions.




Overview


INTRODUCTION 


The research work reported herein is important to the future
development of the NASA/JSC EVA Retriever. This highly autonomous,
free-flying robot or robotic system comprises an MMU, an arm, and
smart hands. It is being developed to aid crewmen in the performance
of EVA tasks, including the chase, capture, and return capability
required for adrift crewmen or station equipment. The ultimate goal
of the work in developing this system is to enhance the
effectiveness of EVA crewmen [1, 23].

The intelligent control of robotic arm/hand systems using neural
network learning controllers is very relevant to EVA Retriever
development because of the need for autonomous, adaptive behavior in
both planned and unplanned contexts in the space environment. Neural
networks and related advanced learning controllers offer such
capabilities [23].

The work reported herein is concerned with the investigation and
development of neural networks or other types of advanced learning
controllers as:

(a) Supervised controllers with training, which, because of their
connective, associative memory structure, can develop significant
controller generalization capability. Such generalization can lead
to similar performance of the retriever arm/hand controller in
different but analogous physical system situations and in
stochastically related loading/excitation environments.

(b) Unsupervised controllers, which can self-train/adapt to new
learning situations and also exhibit significant generalization
capability. As learning develops and unfamiliar situations become
familiar ones, these neural networks should provide feedforward
compensation with less compensation via the feedback path [7, 11,
15, 26].
Neural Networks for Intelligent Control

Neural networks are massively parallel, distributed processing
systems. They have the ability to continuously improve their
performance via dynamic learning [7, 9, 15-18, 36]. As used in this
report, the term neural networks refers to "artificial," i.e.,
programmable, systems of processing elements. As such, they form a
research area of intense interest in artificial intelligence.

Initial neural network research concentrated on the computationally
intensive areas of adaptive signal processing, e.g., pattern
recognition, real-time speech recognition, and image
interpretation. Recently there has been a resurgence of interest in
neural networks because of (a) advances in training algorithms for
networks, and (b) the availability of extremely fast, relatively
inexpensive computers for implementing these algorithms. These
developments have led to the consideration of neural networks for
the real-time identification and control of large
flexible/articulated aerospace and robotic systems [7, 87, 88].

Neural networks can provide mechanisms for (a) associative memory,
(b) pattern recognition, and (c) abstraction. These are emergent
properties of networks of neuronlike units with adaptive synaptic
connections [10, 14, 88, 89, 38]. These mechanisms arise from the
neural network being a system of interconnected "neuronlike"
elements modeled after the human brain. This system operates on
input data in an "all at once" mode rather than in a conventional
computer's "step by step" algorithmic approach [7, 9, 89]. Different
learning architectures can be used in training for intelligent
control. Training provides appropriate inputs to the system so that
the desired responses are obtained. Uncertainty and noise can be
handled by a neural network via the Hebbian type of associative
learning arising from adaptively modified connection strengths [21,
89]. Kawato et al [15-18] indicate that a neural network model can
be used to control voluntary movement, with applications to
robotics. Implemented as a multilayered, hierarchical intelligent
control system, neural networks can be used to effect the following:

(a) Pattern recognition/condition matching

(b) Trajectory and approach, grasping, etc. operations

(c) "Point of view" transformations, e.g., visual to sensor/end
effector to object, etc.

(d) System (robot, object, etc.) state observer or model synthesis
and simulation behavior

(e) Generation of motion/actuator commands.

Adaptive control is useful for systems which perform over large
ranges of uncertainty resulting from large variations in physical
and operating parameter values, environmental conditions, and signal
inputs. However, adaptive control as such (i.e., without
unsupervised learning/unanticipated problem-solving features) has
difficulty with the following generic problems in designing
controllers:

* Sensor data overload, arising from (a) data redundancy per se,
and (b) specialized, rarely required data

* Multi-spectral, multi-sensor data fusion and mapping/use in the
proper feedback control law

* Need for system robustness to handle large parameter excursions

* Degradation of the required high-speed, real-time control
resulting from time-consuming artificial intelligence calculations

* Unsolved sensor choice and placement problems for robotic/large
control systems.

It should be noted that human intervention is used in traditional
control systems operating with large uncertainty. Such intervention
is unacceptable in many real-time applications. This is especially
true for the hostile space environment in which the NASA EVA
Retriever is to operate [1, 23]. This means that automatic
techniques for handling uncertainty must be developed. Neural
networks show great promise for the intelligent, unsupervised
control of the multiple arm plus dextrous robotic hands of the
Retriever. The next section of the report describes the author's
research work with the Barto et al intelligent controller, which is
a special kind of neural network with associative search and
adaptive critic neuronlike elements.

ACE/ASE NEURONLIKE LEARNING CONTROLLER 

The Barto/Sutton/Anderson adaptive learning controller is composed
of two types of neuronlike elements with significant unsupervised
problem-solving capacities. These elements are the associative
search element (ASE) and the adaptive critic element (ACE). Barto et
al (1983) used a single element of each type. Their ASE element
exhibits a learning strategy similar to that of the earlier "BOXES"
adaptive problem-solving system of Michie and Chambers [29, 25]. The
ASE/ACE elements embody refinements discussed in the literature by
Barto and colleagues [2-6, 30-31]. They evolved from the
heterostatic brain function and adaptive systems work of Klopf [19,
20]. Adding a single ACE element improves the learning performance
over that of a single ASE alone. This can be clearly shown by
comparing the problem-solving capabilities of BOXES with those of a
single ASE/single ACE learning system in solving the control problem
of balancing an inverted pendulum on a cart. It is interesting to
note that strong analogies exist between the behavioral
interpretations of the ASE and ACE adaptive elements and animal
behavior in instrumental learning. There are also strong parallels
with the "bootstrap adaption" systems work of Widrow et al [33-35].
This work considered (a) punish/reward learning with a critic and
(b) pattern-recognizing control problems. For artificial (i.e.,
programmable) neural networks, the ASE and ACE neuronlike elements
are significant. This is because they indicate that if adaptive
elements are to learn effectively as network components, then they
must have adaptive capabilities at least as robust as those of the
Barto et al learning controller elements [2].

Figure 1. Representative Model for Cart and Inverted Pendulum
System.

Figure 2. ASE and ACE Controller for Cart Plus Inverted Pendulum
System.

Figure 1 depicts the inverted pendulum on a cart system which is to
be controlled. Here the cart can move within the bounds indicated on
a one-dimensional track. The pendulum can move only in the vertical
plane of the cart and the track. The applied force $F_{app}(t)$
results from the output of the learning controller. It is applied in
a bang-bang (+/-) manner and acts with a fixed magnitude to the left
or right at discrete time intervals. The pendulum-cart system is
described by a four-state-variable model in the time domain [8]. The
four state variables are: (a) $x_c$, the position of the cart on the
track, (b) $\theta_p$, the angle of the pendulum with the vertical,
(c) $\dot{x}_c$, the cart velocity, and (d) $\dot{\theta}_p$, the
rate of change of the pendulum angular displacement. The state
variable model for this system can be written as



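$$
\frac{d}{dt}\begin{bmatrix} x_c \\ \theta_p \end{bmatrix}
= \begin{bmatrix} \dot{x}_c \\ \dot{\theta}_p \end{bmatrix},
\qquad
\begin{bmatrix}
M_c + M_p & M_p L_p \cos\theta_p \\
M_p L_p \cos\theta_p & \tfrac{4}{3} M_p L_p^2
\end{bmatrix}
\frac{d}{dt}\begin{bmatrix} \dot{x}_c \\ \dot{\theta}_p \end{bmatrix}
=
\begin{bmatrix}
M_p L_p \dot{\theta}_p^2 \sin\theta_p - \mu_c \,\mathrm{sgn}(\dot{x}_c) + F_{app} \\
M_p g L_p \sin\theta_p - \mu_p \dot{\theta}_p
\end{bmatrix}
\qquad (1)
$$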


Physical parameters in the above equations specify the pendulum
length and mass, the cart mass, the coefficients of friction between
the cart and the track and at the pin connection between the
pendulum and the cart, the applied control force, and the
acceleration due to gravity; t denotes time. Table 1 defines the
notation used in equation (1).
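A minimal Python sketch of the cart-pendulum model follows, offered only as a reading aid and not as the author's NRLNET code. It writes the model in mass-matrix form (cart and pendulum accelerations coupled through the Mp Lp cos(theta_p) term), solves the 2 x 2 system for the two accelerations by Cramer's rule, and advances the state with Heun's method. Heun's method is just one possible second-order scheme; the report does not name the integrator NRLNET uses. Parameter values follow Table 1. The value g = 9.8 m/s^2, the uniform-rod assumption (moment of inertia (4/3) Mp Lp^2 about the pin, with Lp measured from the pin to the center of mass), and the names `derivatives` and `heun_step` are all assumptions of this sketch.

```python
import math

# Table 1 values (SI units). G and the uniform-rod model are assumptions.
M_C, M_P, L_P = 1.0, 0.10, 0.50   # cart mass (kg), pendulum mass (kg), Lp (m)
MU_C, MU_P = 0.005, 0.00002       # cart Coulomb friction, pin friction (N m s/rad)
G = 9.8                           # gravitational acceleration, m/s^2 (assumed)


def derivatives(state, force):
    """Time derivatives of the state (x_c, theta_p, xdot_c, thetadot_p)."""
    x, theta, x_dot, theta_dot = state
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    sgn = (x_dot > 0) - (x_dot < 0)              # sgn(xdot_c), 0 when at rest
    # Mass matrix [[a11, a12], [a12, a22]] and right-hand side [b1, b2].
    a11 = M_C + M_P
    a12 = M_P * L_P * cos_t
    a22 = (4.0 / 3.0) * M_P * L_P ** 2           # uniform rod about the pin
    b1 = M_P * L_P * theta_dot ** 2 * sin_t - MU_C * sgn + force
    b2 = M_P * G * L_P * sin_t - MU_P * theta_dot
    det = a11 * a22 - a12 * a12                  # positive for every theta
    x_ddot = (b1 * a22 - a12 * b2) / det         # Cramer's rule
    theta_ddot = (a11 * b2 - a12 * b1) / det
    return (x_dot, theta_dot, x_ddot, theta_ddot)


def heun_step(state, force, dt=0.02):
    """Advance the state by one time step with Heun's (second-order) method."""
    k1 = derivatives(state, force)
    guess = tuple(s + dt * k for s, k in zip(state, k1))   # Euler predictor
    k2 = derivatives(guess, force)
    return tuple(s + 0.5 * dt * (a + b) for s, a, b in zip(state, k1, k2))
```

With these sign conventions a positive (rightward) force applied to a cart at rest drives $\theta_p$ toward negative values, and a small positive $\theta_p$ grows when no force is applied, which is the open-loop instability the learning controller must overcome.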

The system of first-order equations has been solved using
second-order numerical integration procedures which have been
implemented in the FORTRAN computer program NRLNET. In implementing
the learning controller algorithm, the state space has been
partitioned into 4 x 7 x 3 x 3 = 252 quantization regions based on
the following interval thresholds:

(1) Cart position $x_c$: +/- 0.8, +/- 2.4 m (4 quantization
intervals, including the failed regions above and below 2.4 m)

(2) Pendulum angular displacement $\theta_p$: 0, +/- 1, +/- 6,
+/- 12 degrees (7 quantization intervals, including the failed
regions above and below 12 degrees)

(3) Cart velocity $\dot{x}_c$: +/- 0.5, +/- infinity m/s (3
quantization intervals)

(4) Pendulum angular velocity $\dot{\theta}_p$: +/- 50, +/- infinity
degrees per second (3 quantization intervals)

Figure 2 depicts the ASE plus ACE adaptive learning controller of
Barto et al [2]. The neuronlike learning system can be described by
the following equations.

Element output $y(t)$, which is determined from the decoded state
quantization interval vector input $x(I,t)$:

$$
y(t) = f\left[ \sum_{I=1}^{N} w(I,t)\, x(I,t) + n(t) \right] \qquad (2)
$$

Here the noise $n(t)$ is a real random variable with probability
density function $p(x)$, and $f$ is either a threshold, sigmoid, or
identity transfer function. For the work reported herein, $p(x)$ is
a Gaussian distribution with mean $M$ and standard deviation
$\sigma$ (values in Table 2), and $f$ is the bang-bang type
threshold function:

$$
f(x) =
\begin{cases}
+1, & x \ge 0 \quad \text{(applied force acts to the right)} \\
-1, & x < 0 \quad \text{(applied force acts to the left)}
\end{cases}
\qquad (3)
$$
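The quantization thresholds and equations (2) and (3) can be illustrated with the hypothetical Python sketch below; it is not taken from NRLNET. It assumes each of the 252 regions is encoded as a one-hot input vector, so $x(I,t)$ is 1 for the active region and 0 elsewhere, and the weighted sum in equation (2) reduces to a single weight. The names `decode`, `f`, and `ase_output`, the handling of values lying exactly on a threshold (they fall into the upper interval), and the encoding of the failed regions as interval 0 on the position and angle axes are all assumptions.

```python
import bisect
import random

# Interval thresholds from the quantization list (angles in degrees).
POS_LIMIT, POS_CUTS = 2.4, [-0.8, 0.8]                    # m
ANG_LIMIT, ANG_CUTS = 12.0, [-6.0, -1.0, 0.0, 1.0, 6.0]   # deg
VEL_CUTS = [-0.5, 0.5]                                    # m/s
ANGVEL_CUTS = [-50.0, 50.0]                               # deg/s
N_BOXES = 4 * 7 * 3 * 3                                   # 252 regions


def decode(x_c, theta_deg, xdot_c, thetadot_deg):
    """Return (region index 0..251, failed flag) for one state."""
    # Interval 0 on the position or angle axis marks a failed region.
    p = 0 if abs(x_c) > POS_LIMIT else 1 + bisect.bisect_right(POS_CUTS, x_c)
    a = 0 if abs(theta_deg) > ANG_LIMIT else 1 + bisect.bisect_right(ANG_CUTS, theta_deg)
    v = bisect.bisect_right(VEL_CUTS, xdot_c)
    w = bisect.bisect_right(ANGVEL_CUTS, thetadot_deg)
    return ((p * 7 + a) * 3 + v) * 3 + w, (p == 0 or a == 0)


def f(s):
    """Equation (3): bang-bang threshold (+1 pushes right, -1 pushes left)."""
    return 1 if s >= 0.0 else -1


def ase_output(weights, region, mean=0.0, sigma=0.01, rng=random):
    """Equation (2) with a one-hot x(I,t): y = f[w(region,t) + n(t)]."""
    return f(weights[region] + rng.gauss(mean, sigma))
```

For example, with these conventions the upright, motionless state `decode(0.0, 0.0, 0.0, 0.0)` maps to region 166 and is not a failure.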

ASE weights $w(I,t)$, $1 \le I \le N$, change over discrete time as
follows:

$$
w(I,t+1) = w(I,t) + \alpha\, r(t)\, e(I,t) \qquad (4)
$$

In equation (4):

$\alpha$ (ALPHA) = positive constant determining the rate of change
in $w(I,t)$

$r(t)$ = real-valued reinforcement at time $t$

$e(I,t)$ = eligibility at time $t$ via the input pathway $I$.

The eligibility traces for the ASE weights decay exponentially with
increasing time, as given in equation (5):

$$
e(I,t+1) = \delta\, e(I,t) + (1-\delta)\, y(t)\, x(I,t) \qquad (5)
$$

in which

$\delta$ (DELTA) = the eligibility decay rate.
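Equations (4) and (5) translate directly into array updates. The hypothetical Python function below applies them to every pathway $I$ for one time step, using the ALPHA and DELTA values from Table 2 as defaults. It assumes the weight change in equation (4) uses the eligibility $e(I,t)$ before the trace is advanced to $e(I,t+1)$ by equation (5), which is how the time indices of the two equations read.

```python
ALPHA, DELTA = 1000.0, 0.90   # Table 2 values


def ase_update(w, e, r, y, x, alpha=ALPHA, delta=DELTA):
    """One step of equations (4) and (5); returns (w(I,t+1), e(I,t+1))."""
    # Equation (4): reinforcement times the current eligibility e(I,t).
    new_w = [wi + alpha * r * ei for wi, ei in zip(w, e)]
    # Equation (5): exponentially decaying trace of y(t) * x(I,t).
    new_e = [delta * ei + (1.0 - delta) * y * xi for ei, xi in zip(e, x)]
    return new_w, new_e
```

For example, with `w = [0.0, 0.0]`, `e = [0.5, 0.0]`, `r = -1.0`, `y = 1`, and `x = [0, 1]`, the first weight drops to -500.0 while the eligibility shifts toward the newly active second pathway (approximately `[0.45, 0.1]`). When `r` is zero, the weights are unchanged and only the traces evolve.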


ACE weights $v(I,t)$, $1 \le I \le N$, change over discrete time as
follows:

$$
v(I,t+1) = v(I,t) + \beta\, \hat{r}(t)\, \bar{x}(I,t) \qquad (6)
$$

In equation (6):

$\beta$ (BETA) = positive constant defining the rate of change of
$v(I,t)$

$\hat{r}(t) = r(t) + \gamma\, p(t) - p(t-1)$, the improved internal
reinforcement signal for the critic element

$\bar{x}(I,t+1) = \lambda\, \bar{x}(I,t) + (1-\lambda)\, x(I,t)$, the
eligibility traces for the ACE weights

$p(t) = \sum_{I=1}^{N} v(I,t)\, x(I,t)$, the prediction of eventual
reinforcement

$\gamma$ (GAMMA) = discount factor for the improved internal
reinforcement

$\lambda$ (LAMBDA) = decay rate for the ACE eligibility traces
$\bar{x}(I,t)$

Barto and Sutton [2, 5] explain the derivation of the ACE learning
rule as used above. Additional discussion of the ASE/ACE adaptive
learning controller can be found in references [30, 31].
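The critic update of equation (6) and its definitions can be sketched the same way. The Python below is an illustration, not the NRLNET implementation. It uses the BETA, GAMMA, and LAMBDA values from Table 2 as defaults, assumes the one-hot input $x(I,t)$ used for the ASE, and takes the previous prediction $p(t-1)$ as an argument; the names `prediction` and `ace_update` are hypothetical. In Barto et al's combined ASE/ACE system, it is the returned $\hat{r}(t)$, rather than $r(t)$, that is passed on to drive the ASE weight change.

```python
BETA, GAMMA, LAMBDA = 0.50, 0.95, 0.95   # Table 2 values


def prediction(v, x):
    """p(t): sum over I of v(I,t) * x(I,t)."""
    return sum(vi * xi for vi, xi in zip(v, x))


def ace_update(v, xbar, x, r, p_prev, beta=BETA, gamma=GAMMA, lam=LAMBDA):
    """One step of equation (6); returns (v(I,t+1), xbar(I,t+1), rhat(t), p(t))."""
    p = prediction(v, x)
    rhat = r + gamma * p - p_prev                 # improved internal reinforcement
    new_v = [vi + beta * rhat * xb for vi, xb in zip(v, xbar)]
    new_xbar = [lam * xb + (1.0 - lam) * xi for xb, xi in zip(xbar, x)]
    return new_v, new_xbar, rhat, p
```

In use, the caller keeps the returned `p` and passes it back as `p_prev` on the next time step, so the critic compares successive predictions as the definition of $\hat{r}(t)$ requires.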

ASE/ACE LEARNING CONTROLLER RESULTS 

This section of the report discusses representative results
obtained by the author with his FORTRAN computer program NRLNET131
implementing the Barto et al ASE/ACE neuronlike learning controller.
This program is the result of several modifications by the author to
incorporate general data file input and file and printer plot output
of the applied control force and the four state variables as
functions of time. The original FORTRAN program NRLNET00 was the
author's implementation of a PASCAL program written in 1988 by Doug
Walker of GHG in support of the Special Projects Branch (EC5) in the
Crew and Thermal Systems Division at NASA/JSC.


TABLE 1. SUMMARY OF PHYSICAL PARAMETER VALUES FOR CART PLUS INVERTED
PENDULUM SYSTEM

Mc      Cart Mass, 1.0 kg
Mp      Pendulum Mass, 0.10 kg
Lp      Pendulum Length, 0.50 m
MUc     Cart Coefficient of Coulomb Friction, 0.005
MUp     Pendulum/Cart Pin Coefficient of Friction, 0.00002 N m sec/rad
Fapp    Magnitude of Force Applied to Cart in x Direction, (+/-) 10 N


TABLE 2. SUMMARY OF THE ASE/ACE NEURONLIKE LEARNING CONTROLLER
PARAMETERS

ALPHA   Rate Constant for ASE Weights, 1000.0
BETA    Rate Constant for ACE Weights, 0.50
DELTA   Decay Rate for ASE Eligibility Traces, 0.90
GAMMA   Discount Factor for Improved Internal Reinforcement, 0.95
LAMBDA  Decay Rate for ACE Eligibility Traces, 0.95
M       Mean Value for Gaussian Normal Distribution Used to Define
        ASE/ACE Output Noise Function, 0.00 and 0.10
SIGMA   Standard Deviation Value for Gaussian Normal Distribution Used
        to Define ASE/ACE Output Noise Function, 0.01, 0.05, 0.10,
        0.15, 0.20, and 0.25


Figure 3. Example Simulation Results Showing Learning Performance of
ASE/ACE Learning Controller System.

Figure 4. Average Number of Trials for Five Runs as a Function of
Standard Deviation With Mean Value as Parameter.

Figure 5. Applied Force $F_{app}(t)$, N.

Figure 8. Pendulum Angular Displacement $\theta_p(t)$, rad.

Figure 9. Pendulum Angular Velocity $\dot{\theta}_p(t)$, rad/sec.

Table 1 gives the physical and control parameter values used in the
simulation work with NRLNET131 for the cart plus inverted pendulum
system depicted in Figure 1. Values used for the ASE/ACE neuronlike
learning controller parameters are summarized in Table 2. These
parameter values were used to generate the simulation performance
results plotted in Figures 3 and 4. Figure 3 plots curves for the
number of time steps until failure versus the trial number. These
are typical curves for individual runs of the ASE/ACE learning
controller system. Figure 4 gives plots of the average number of
trials required to stabilize the cart plus pendulum system for 5000
time steps (100 seconds with Δt = 0.02 seconds). The average number
of trials is given as a function of the standard deviation of the
Gaussian normal random noise process, with the mean of the process
as a parameter. The zero-mean curve indicates a trend toward an
increasing number of trials as the standard deviation is increased
from 0.01 to 0.25. The other curve, for a mean of 0.10, shows
relative constancy over the same range of standard deviation. These
runs were originally made to examine the sensitivity of the ASE/ACE
learning controller performance to variation in the noise process
used in generating its output function. An additional objective was
to develop a base from which the generalization and robustness
properties of the controller weights could be investigated. Five
runs were used to generate each point plotted in Figure 4. The
results shown in Figures 3 and 4 are in general agreement with those
published by Barto et al [2]. However, the author has found that his
NRLNET131 implementation of the ASE/ACE controller usually takes
fewer trials to learn to stabilize the cart plus pendulum system,
both for the 5000 Δt (100 seconds) cases shown here and for the
200,000 Δt (66.7 minutes) cases which the author ran to compare his
results directly with those of Barto et al. Extensive runs were not
made for the 200,000 Δt (66.7 minutes) stabilization period because
of the excessively long elapsed time required for the VAX system
available to the author to return answers for a single run.

Figure 5 shows the controller output force which is applied to the
cart in stabilizing the inverted pendulum. Here the applied force is
plotted as a function of time over the first 100 Δt intervals (2
seconds). Extensive runs have been made with the ASE/ACE controller
system, and all exhibit the characteristic +/- 10 N on-off or
bang-bang behavior with Δt = 0.02 sec. Based on the physical system
oscillation behavior, this value of the time increment should be
adequate for the second-order numerical integration scheme used.

Figures 6-9 plot the four state variables: cart displacement
($x_c$), cart velocity ($\dot{x}_c$), pendulum angular displacement
($\theta_p$), and pendulum angular velocity ($\dot{\theta}_p$),
respectively. They are also plotted as functions of time over the
first 100 Δt intervals (2 seconds). Consideration of these and
similar time domain results for the state variables and the applied
force indicates that (a) significant inefficiencies can occur with
respect to the input force and its impact on the actual state
variable behavior of the cart plus pendulum system, and (b) with
Δt = 0.02 sec there may be some interaction between the numerical
integration method used and the dynamics of the ASE/ACE learning
controller. To investigate (b), additional runs were made in which
Δt was reduced (Δt = 0.01, 0.005, 0.001 sec). These results,
although not included here, did show a significant reduction with
decreasing Δt in the bang-bang nature of the input force and in the
higher frequency oscillations present in the state behavior over
time (especially for the cart linear and the pendulum angular
velocities).

The author has extended the single ASE/single ACE learning
controller system to include two search and two critic elements.
The elements in each pair work in parallel. Since the outputs are
averaged in the 2 ASE/2 ACE learning controller system, it has
-Fapp, 0, and +Fapp as three possible outputs. This extension was
implemented in the author's FORTRAN computer program NRLNET20.
Initial runs indicate that the new controller as implemented has
good performance up to a maximum learning point (maximum time for
stability as a function of number of trials). Beyond this point the
learning is severely degraded with increasing trials, or a form of
limit cycle behavior occurs. These results indicate that the
split-decision nature of the 2 ASE/2 ACE system, as implemented in
its averaging form, may cause the observed behavior. In this case,
using an odd number of elements (3, 5, etc.) in the ASE/ACE system
may be warranted. These controllers would also have a "smoother"
(i.e., less bang-bang) control action. Another alternative to
improve performance is to connect the elements more richly, both
within and across the search element and critic element layers. This
would give the ASE/ACE neuronlike controller system a
counterpropagation/Grossberg layer plus Kohonen layer type of neural
network structure [12, 13].
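The three-level output of the averaged 2 ASE/2 ACE controller, and the effect of using an odd number of elements, can be checked by enumerating every combination of +/-1 element outputs. The Python below is a minimal sketch that assumes, as a simplification, that the average of the element outputs is simply scaled by the Table 1 force magnitude Fapp = 10 N; the names `averaged_force` and `force_levels` are hypothetical.

```python
from itertools import product

F_APP = 10.0   # N, force magnitude from Table 1


def averaged_force(outputs, f_app=F_APP):
    """Scale the average of the +/-1 element outputs by Fapp."""
    return f_app * sum(outputs) / len(outputs)


def force_levels(n_elements):
    """All distinct forces an n-element averaging controller can apply."""
    return sorted({averaged_force(o) for o in product((-1, 1), repeat=n_elements)})


print(force_levels(2))   # [-10.0, 0.0, 10.0]
print(force_levels(3))   # [-10.0, -3.3333333333333335, 3.3333333333333335, 10.0]
```

With two elements a split decision averages to exactly zero force. With three, no tie is possible, and the intermediate levels give a less bang-bang action, in line with the odd-number suggestion in the text.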

CONCLUSIONS 

An examination has been made of the use of neural networks for the
intelligent control of robotic arm plus hand/manipulator systems for
the EVA Retriever. Based so far largely on a review of the
literature and on computer simulation work, this examination has
indicated that a hierarchical, multi-layer neural network system can
be used for intelligent control. Baseline feedforward control is
used in conjunction with trajectory planning in these systems. Joint
torque feedback provides the correction signal. These systems have
the characteristic that, as additional response behaviors are
learned, much of the control action passes to the feedforward path.

Additional investigation into neural networks for intelligent
control has focused on the use and extension of the Barto et al
neuronlike ASE/ACE intelligent controller. A FORTRAN family of
computer programs (NRLNETXX) was developed by the author as
extensions of a previous Pascal language implementation of the
controller at NASA/JSC. Work with these programs has concentrated on
the following: (a) verifying published results for convergence to a
stable solution (number of trials for a specified period of
stability), (b) developing graphics and other feedback tools to
monitor system behavior (e.g., as given by the applied control force
and the four state variables as functions of time), (c)
investigating learning control behavior as a function of the number
of unsupervised trials required to obtain stability and of the
random process parameters (Gaussian process mean and standard
deviation), and (d) basic extensions to the learning controller
network incorporating two associative search elements (ASEs) and two
adaptive critic elements (ACEs) in its structure.
RECOMMENDATIONS


This section of the report presents recommendations for the
intelligent control of smart robotic arm plus hand systems using
neural networks. These recommendations are based on the results
presented above and on additional related work done by the author
during Summer 1988. They are presented in the form of a research and
development program plan. The R & D program plan gives activities
that can continue the author's research begun during the 1988 summer
program.

(1) Investigation of two-dimensional graphics as a kinematic
simulation tool for planning EVA object retrieval in terms of the
approach to and grasping of objects using an articulated two-link
arm/scissor hand system.

(2) Implementation of dynamics, sensing, and control models of the
articulated two-link arm/scissor hand system. It is desired to mount
this arm/hand system on a cart to represent the EVA Retriever in two
dimensions.

(3) Examination of hierarchical neural networks with fuzzy logic
reasoning as adaptive/general learning systems comprised of (a)
network architectures, (b) transfer functions, and (c) dynamic
learning rules. These systems can employ joint torque and state
vector feedback to control the arm/hand system(s) in object
retrieval as discussed above.

(4) Investigation of extensions of the Barto/Anderson neuronlike
learning system and counterpropagation/backpropagation type networks
to the related problem of stabilizing/controlling the motion of
simple and compound (articulated linkage) pendulums on a cart.
Successful employment here can lead to similar use with the arm/hand
retriever systems.